Books Corpus
https://paperswithcode.com/dataset/bookcorpus
(still on my to-read pile)
Proposed in
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books (2015)
https://yknzhu.wixsite.com/mbweb
Loading it from Hugging Face looks like the best option
https://huggingface.co/datasets/bookcorpus
The load_dataset example in the docs specifies bookcorpus
https://huggingface.co/docs/datasets/loading.html#slice-splits
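A minimal sketch of the split-slicing usage described on the page above, assuming the Hugging Face `datasets` package is installed and that the `bookcorpus` dataset still exposes a single `train` split. The helper names `slice_spec` and `load_bookcorpus_sample` are hypothetical, made up for this note. Depending on the installed `datasets` version, script-based datasets such as this one may need extra options or may not load at all, so treat this as a starting point rather than a verified recipe.

```python
def slice_spec(split: str = "train", percent: int = 1) -> str:
    """Build a split-slicing string such as 'train[:1%]'.

    Hypothetical helper for this note; the slicing syntax itself
    comes from the datasets documentation.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    return f"{split}[:{percent}%]"


def load_bookcorpus_sample(percent: int = 1):
    """Load the first `percent` percent of the bookcorpus train split.

    Assumption: the dataset id "bookcorpus" resolves on the Hub and the
    installed datasets version can still load it.
    """
    # Imported lazily so slice_spec can be used without datasets installed.
    from datasets import load_dataset

    return load_dataset("bookcorpus", split=slice_spec("train", percent))
```

Calling `load_bookcorpus_sample(1)` triggers a download, so it is best tried with a small percentage first.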
Code example for obtaining the data
https://github.com/soskek/bookcorpus
These are scripts to reproduce BookCorpus by yourself.
However, BookCorpus is no longer distributed...
Other ways of obtaining it are also pointed out there